Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
基础模型(FMS)已证明了前所未有的功能,包括零拍学习,高保真数据合成和范围内的概括。但是,正如我们在本文中所显示的那样,FMS在专家任务上的开箱即用表现较差(例如,从语言查询中检索汽车手册技术插图),数据是看不见的,或者属于长尾的数据用于FM预训练的大型数据集的数据分布的一部分。这强调了在此类专家任务上明确评估和芬太尼FMS的必要性,这可以说是在实际现实世界中最重要的任务。在本文中,我们提出了围绕教授FMS了解技术文档的任务,通过学习将其图形插图与相应的语言描述相匹配的任务围绕着了解技术文档的任务。我们的FETA基准重点是公共汽车手册和销售目录手册中的文本对图像和图像到文本检索。 FETA配备了完全自动注释提取的程序(接受后将发布代码),从而使Feta轻松扩展到将来更多的文档类型和应用域。我们的自动注释导致自动性能指标显示,该指标与在人类策划注释中计算的指标一致(也发布)。我们提供多个基线和对FETA的流行FM的分析,从而导致一些有趣的发现,我们认为这对FM社区非常有价值,为现实世界中FMS应用于当前被标准基准的“忽视”的实践专家任务铺平了道路。在常见对象上。
translated by 谷歌翻译
概括跨越不同视觉域的学习表现的能力,例如在真正的照片,剪贴画,绘画和草图之间是人类视觉系统的基本容量。在本文中,不同于利用一些(或全部)源域监控的大多数跨域工作,我们接近一个相对较新的,非常实用的无监督域泛化(UDG)设置在既不源也不在源域中没有培训监督。我们的方法是基于跨域(BRAD)的桥梁​​的自我监督学习 - 辅助桥域附有一组从每个训练域的Brad将视觉(图像到图像)映射保留的一组语义。 BRAD和MAPPAPAPPED(端到端)与对比的自我监督表示模型一起学习(端到端),其用语义对齐每个域将每个域对齐,因此隐含地驱动所有域(见或看不见)语义上彼此对齐。在这项工作中,我们展示了如何使用边缘正则化的布拉德,我们的方法在多个基准和一系列任务中实现了显着的增益,包括UDG,少量UDA和跨多个域数据集的无监督概括(包括指向未经看明域的概念和课程)。
translated by 谷歌翻译
Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model-tuning by the end-user who may prefer to have a system that works ``out-of-the-box." To address such practical scenarios, we identify a novel target domain (inference-time) VAD task where no target domain training data are available. To this end, we propose a new `Zero-shot Cross-domain Video Anomaly Detection (zxvad)' framework that includes a future-frame prediction generative model setup. Different from prior future-frame prediction models, our model uses a novel Normalcy Classifier module to learn the features of normal event videos by learning how such features are different ``relatively" to features in pseudo-abnormal examples. A novel Untrained Convolutional Neural Network based Anomaly Synthesis module crafts these pseudo-abnormal examples by adding foreign objects in normal video frames with no extra training cost. With our novel relative normalcy feature learning strategy, zxvad generalizes and learns to distinguish between normal and abnormal frames in a new target domain without adaptation during inference. Through evaluations on common datasets, we show that zxvad outperforms the state-of-the-art (SOTA), regardless of whether task-relevant (i.e., VAD) source training data are available or not. Lastly, zxvad also beats the SOTA methods in inference-time efficiency metrics including the model size, total parameters, GPU energy consumption, and GMACs.
translated by 谷歌翻译
Transformer layers, which use an alternating pattern of multi-head attention and multi-layer perceptron (MLP) layers, provide an effective tool for a variety of machine learning problems. As the transformer layers use residual connections to avoid the problem of vanishing gradients, they can be viewed as the numerical integration of a differential equation. In this extended abstract, we build upon this connection and propose a modification of the internal architecture of a transformer layer. The proposed model places the multi-head attention sublayer and the MLP sublayer parallel to each other. Our experiments show that this simple modification improves the performance of transformer networks in multiple tasks. Moreover, for the image classification task, we show that using neural ODE solvers with a sophisticated integration scheme further improves performance.
translated by 谷歌翻译
Image segmentation is a fundamental task in computer vision. Data annotation for training supervised methods can be labor-intensive, motivating unsupervised methods. Some existing approaches extract deep features from pre-trained networks and build a graph to apply classical clustering methods (e.g., $k$-means and normalized-cuts) as a post-processing stage. These techniques reduce the high-dimensional information encoded in the features to pair-wise scalar affinities. In this work, we replace classical clustering algorithms with a lightweight Graph Neural Network (GNN) trained to achieve the same clustering objective function. However, in contrast to existing approaches, we feed the GNN not only the pair-wise affinities between local image features but also the raw features themselves. Maintaining this connection between the raw feature and the clustering goal allows to perform part semantic segmentation implicitly, without requiring additional post-processing steps. We demonstrate how classical clustering objectives can be formulated as self-supervised loss functions for training our image segmentation GNN. Additionally, we use the Correlation-Clustering (CC) objective to perform clustering without defining the number of clusters ($k$-less clustering). We apply the proposed method for object localization, segmentation, and semantic part segmentation tasks, surpassing state-of-the-art performance on multiple benchmarks.
translated by 谷歌翻译
In object detection, post-processing methods like Non-maximum Suppression (NMS) are widely used. NMS can substantially reduce the number of false positive detections but may still keep some detections with low objectness scores. In order to find the exact number of objects and their labels in the image, we propose a post processing method called Detection Selection Algorithm (DSA) which is used after NMS or related methods. DSA greedily selects a subset of detected bounding boxes, together with full object reconstructions that give the interpretation of the whole image with highest likelihood, taking into account object occlusions. The algorithm consists of four components. First, we add an occlusion branch to Faster R-CNN to obtain occlusion relationships between objects. Second, we develop a single reconstruction algorithm which can reconstruct the whole appearance of an object given its visible part, based on the optimization of latent variables of a trained generative network which we call the decoder. Third, we propose a whole reconstruction algorithm which generates the joint reconstruction of all objects in a hypothesized interpretation, taking into account occlusion ordering. Finally we propose a greedy algorithm that incrementally adds or removes detections from a list to maximize the likelihood of the corresponding interpretation. DSA with NMS or Soft-NMS can achieve better results than NMS or Soft-NMS themselves, as is illustrated in our experiments on synthetic images with mutiple 3d objects.
translated by 谷歌翻译
Out-of-distribution (OOD) detection has attracted a large amount of attention from the machine learning research community in recent years due to its importance in deployed systems. Most of the previous studies focused on the detection of OOD samples in the multi-class classification task. However, OOD detection in the multi-label classification task remains an underexplored domain. In this research, we propose YolOOD - a method that utilizes concepts from the object detection domain to perform OOD detection in the multi-label classification task. Object detection models have an inherent ability to distinguish between objects of interest (in-distribution) and irrelevant objects (e.g., OOD objects) on images that contain multiple objects from different categories. These abilities allow us to convert a regular object detection model into an image classifier with inherent OOD detection capabilities with just minor changes. We compare our approach to state-of-the-art OOD detection methods and demonstrate YolOOD's ability to outperform these methods on a comprehensive suite of in-distribution and OOD benchmark datasets.
translated by 谷歌翻译
This is a continuation of our recent paper in which we developed the theory of sequential parametrized motion planning. A sequential parametrized motion planning algorithm produced a motion of the system which is required to visit a prescribed sequence of states, in a certain order, at specified moments of time. In the previous publication we analysed the sequential parametrized topological complexity of the Fadell - Neuwirth fibration which in relevant to the problem of moving multiple robots avoiding collisions with other robots and with obstacles in the Euclidean space. Besides, in the preceeding paper we found the sequential parametrised topological complexity of the Fadell - Neuwirth bundle for the case of the Euclidean space $\Bbb R^d$ of odd dimension as well as the case $d=2$. In the present paper we give the complete answer for an arbitrary $d\ge 2$ even. Moreover, we present an explicit motion planning algorithm for controlling multiple robots in $\Bbb R^d$ having the minimal possible topological complexity; this algorithm is applicable to any number $n$ of robots and any number $m\ge 2$ of obstacles.
translated by 谷歌翻译
Domain generalization (DG) aims to train a model to perform well in unseen domains under different distributions. This paper considers a more realistic yet more challenging scenario,namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first try to understand when neural networks fail to generalize? We empirically ascertain a property of a model that correlates strongly with its generalization that we coin as "model sensitivity". Based on our analysis, we propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses the state-of-the-art single-DG methods.
translated by 谷歌翻译